[I am writing this in the week following the break-up of the space shuttle Columbia on re-entry. Our thoughts are with the families of Rick Husband, Michael Anderson, Laurel Clark, David Brown, William McCool, Kalpana Chawla and Ilan Ramon.]

NASA Ames Research Center is one of NASA’s oldest centers, having started out as part of the National Advisory Committee for Aeronautics (NACA). The site, about 40 miles south of San Francisco, still houses many wind tunnels and other aviation-related departments. In recent years, with the growing realization that space exploration is heavily dependent on computing and data analysis, its focus has turned more towards Information Technology, and the Computational Sciences Division has expanded rapidly as a result. In this article, I give a brief overview of some of the past and present projects with a Bayesian content. Much more goes on within the Division than is described here; the web pages at http://ic.arc.nasa.gov give more information on these and the Division's other projects.

AUTOCLASS: Bayesian research at Ames began in 1985. The first major project, led by Peter Cheeseman, was AUTOCLASS, a system for performing unsupervised classification of data where the number and description of the natural classes in the data are not known. AUTOCLASS handles missing data and mixed real and discrete attributes, and estimates the posterior probability over a range of model structures. It is one of the earliest examples of a Bayes net system, albeit of a restricted class. AUTOCLASS has proved extremely useful in practice: it has found subtly different classes that were unknown to the investigators, as well as many classes that were already known to the investigators but were not supplied to AUTOCLASS. AUTOCLASS is publicly available.
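To make the point about missing data and mixed attributes concrete, here is a minimal Python sketch of how a fitted mixture of the kind AUTOCLASS searches over can assign one case to classes. It assumes, as a simplification, that attributes are independent within each class (a normal for each real attribute, a multinomial for each discrete one); the class parameters are invented for illustration and are not AUTOCLASS output. A missing value is marginalized out, which under this independence assumption simply means dropping its factor.

```python
import math

def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

def class_posteriors(case, classes):
    """Posterior class-membership probabilities for one case.

    case:    dict attribute -> value, with None marking a missing value.
    classes: list of dicts with 'weight' (mixing proportion), 'real'
             (attribute -> (mean, sd)) and 'discrete' (attribute -> {value: prob}).
    """
    log_scores = []
    for c in classes:
        score = math.log(c["weight"])
        for attr, (mean, sd) in c["real"].items():
            if case.get(attr) is not None:  # a missing value's factor integrates to 1
                score += normal_logpdf(case[attr], mean, sd)
        for attr, table in c["discrete"].items():
            if case.get(attr) is not None:
                score += math.log(table[case[attr]])
        log_scores.append(score)
    top = max(log_scores)  # log-sum-exp trick for numerical stability
    unnorm = [math.exp(s - top) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-class model with one real and one discrete attribute.
classes = [
    {"weight": 0.6, "real": {"size": (1.0, 0.5)}, "discrete": {"color": {"red": 0.9, "blue": 0.1}}},
    {"weight": 0.4, "real": {"size": (4.0, 1.0)}, "discrete": {"color": {"red": 0.2, "blue": 0.8}}},
]
print(class_posteriors({"size": 3.5, "color": None}, classes))  # color is missing
```

When every attribute is missing, the posterior falls back to the mixing proportions, which is the behavior one would want from a Bayesian treatment.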

IND: Another early project, led by Wray Buntine, was the IND system, Bayesian software for supervised classification using decision trees. A tree is “grown” from data using a recursive partitioning algorithm, with the aim of producing a tree that (hopefully) predicts classes well on new data. As well as reimplementing parts of some of the standard decision tree algorithms (e.g. C4) and offering experimental control suites, IND also introduced Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates, which are important in applications like diagnosis.
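The gap between a raw frequency estimate at a leaf and a Bayesian one shows up even on tiny leaves. This sketch assumes a symmetric Dirichlet prior on the class probabilities at each leaf, with a hypothetical prior strength `alpha`; IND's actual priors and its averaging over trees are more elaborate.

```python
def leaf_probabilities(counts, alpha=1.0):
    """Posterior-mean class probabilities at a decision-tree leaf.

    counts: number of training examples of each class reaching the leaf.
    alpha:  symmetric Dirichlet prior parameter (alpha = 1 gives the Laplace estimate).
    """
    total = sum(counts) + alpha * len(counts)
    return [(n + alpha) / total for n in counts]

def frequency_probabilities(counts):
    """Maximum-likelihood (raw frequency) estimate, for comparison."""
    n = sum(counts)
    return [c / n for c in counts]

# A small, pure leaf: 3 'faulty' examples and 0 'healthy' ones.
print(frequency_probabilities([3, 0]))  # [1.0, 0.0]
print(leaf_probabilities([3, 0]))       # [0.8, 0.2]
```

The frequency estimate claims certainty from three examples; the Bayesian estimate stays appropriately hedged, which matters when the probabilities feed a diagnostic decision.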

The approach used in IND has subsequently been adapted to learning Bayesian networks from data, to learning n-grams for language modeling, and to a classification model known as Alternating Decision Trees. The data structures and algorithms have been quite influential. Moreover, rumor has it that Breiman, an influential Bayesian antagonist, was motivated by IND's apparent successes to develop the Bagging approach to classification trees, which subsequently became the empirical champion in the field.

IND has seen widespread use in empirical and applied studies, and is publicly available.

AUTOBAYES: An ongoing project of general 
applicability in Bayesian analysis is the AutoBayes 
project - Automatic Synthesis of Machine Learning 
Programs from Bayesian Networks, or: BUGS on 
Steroids. 

AutoBayes is an automatic program synthesis system for the machine learning domain under development […]

[…] have been developed, tuned to solving diagnosis problems.

Scheduling: New research is applying Bayesian techniques to scheduling problems. The domain here is one in which a number of tasks must be scheduled, given a set of constraints on when the tasks must be performed. Completing certain tasks, or subsets of the tasks, results in a numerical reward. There is uncertainty about the duration of the individual tasks, so the problem becomes one of building the schedule that maximizes the expected reward obtainable.
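The objective can be sketched in a few lines of Python, assuming a single resource that runs tasks one after another, one shared deadline standing in for the timing constraints, and a reward for each task finished by that deadline. The task names, durations and rewards are hypothetical. The expected reward of each order is estimated by Monte Carlo sampling of the uncertain durations, and brute-force search over orders stands in for whatever search the actual project uses.

```python
import itertools
import random

# Hypothetical tasks: name -> (mean duration, duration spread, reward if done by the deadline).
TASKS = {"a": (3.0, 1.0, 5.0), "b": (2.0, 0.5, 3.0), "c": (4.0, 2.0, 8.0)}
DEADLINE = 7.0

def sample_reward(order, rng):
    """Reward from one simulated run of the tasks, executed one after another."""
    t, reward = 0.0, 0.0
    for name in order:
        mean, spread, value = TASKS[name]
        t += max(0.0, rng.gauss(mean, spread))  # a duration cannot be negative
        if t <= DEADLINE:
            reward += value
    return reward

def expected_reward(order, n_samples=10000, seed=0):
    """Monte Carlo estimate; the fixed seed makes comparisons between orders repeatable."""
    rng = random.Random(seed)
    return sum(sample_reward(order, rng) for _ in range(n_samples)) / n_samples

def best_order():
    """Exhaustive search: fine for a handful of tasks, hopeless for real schedules."""
    return max(itertools.permutations(TASKS), key=expected_reward)

print(best_order())
```

Because durations are uncertain, the best order is not simply "most valuable first": a long, variable task early on can push everything after it past the deadline.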

Autonomous Exploration: This project investigated the application of Bayesian statistics to the problem of autonomous geological exploration with a robotic vehicle. It concentrated on the sub-problem of classifying rock types while addressing the issues associated with operating onboard a mobile robot. The Bayesian paradigm was used in a natural way to solve the more general robotic problems of autonomously profiling an area and allocating scarce sensor resources. Major considerations are the need to use multiple sensors and the ability of a robotic vehicle to acquire data from different locations. Needless sensor use should be curtailed where possible, for example when an object is already sufficiently well identified by the sensor data acquired so far. Furthermore, by investigating rocks in many locations, the robot has the opportunity to profile the environment. Different rock samples are statistically dependent on each other, and these dependencies can be exploited to substantially improve classification accuracy.

The classification system was implemented onboard the Nomad robot developed at Carnegie Mellon University, and applied to the task of recognizing meteorites amongst terrestrial rocks in Antarctica. In January 2000 A.D., Nomad was deployed to Antarctica, where it made the first autonomous robotic identification of a meteorite.
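The rule of stopping once a rock is sufficiently well identified can be sketched as a sequential Bayesian update: each sensor reading multiplies the posterior over rock types by its likelihood, and sensing stops once one type passes a probability threshold. The rock classes, sensor likelihood tables and the 0.95 threshold are hypothetical, and readings are assumed conditionally independent given the rock type; the dependencies between rocks at different locations that the project exploits are not modeled here.

```python
def update(prior, likelihood, reading):
    """One Bayes step: posterior(c) is proportional to prior(c) * P(reading | c)."""
    unnorm = {c: prior[c] * likelihood[c][reading] for c in prior}
    total = sum(unnorm.values())
    return {c: p / total for c, p in unnorm.items()}

def classify(readings, prior, likelihoods, threshold=0.95):
    """Take sensor readings in order until the most probable class passes the threshold.

    Returns (best class, its posterior probability, number of readings used).
    """
    posterior = dict(prior)
    used = 0
    for sensor, value in readings:
        if max(posterior.values()) >= threshold:
            break  # further sensing would be needless
        posterior = update(posterior, likelihoods[sensor], value)
        used += 1
    best = max(posterior, key=posterior.get)
    return best, posterior[best], used

# Hypothetical prior and sensor models.
prior = {"meteorite": 0.1, "basalt": 0.45, "sandstone": 0.45}
likelihoods = {
    "camera": {"meteorite": {"dark": 0.9, "light": 0.1},
               "basalt": {"dark": 0.7, "light": 0.3},
               "sandstone": {"dark": 0.1, "light": 0.9}},
    "metal_detector": {"meteorite": {"yes": 0.8, "no": 0.2},
                       "basalt": {"yes": 0.05, "no": 0.95},
                       "sandstone": {"yes": 0.01, "no": 0.99}},
    "spectrometer": {"meteorite": {"iron": 0.9, "silicate": 0.1},
                     "basalt": {"iron": 0.2, "silicate": 0.8},
                     "sandstone": {"iron": 0.1, "silicate": 0.9}},
}
readings = [("camera", "dark"), ("metal_detector", "yes"),
            ("spectrometer", "iron"), ("camera", "dark")]
print(classify(readings, prior, likelihoods))
```

With these numbers the fourth reading is never taken: after three readings the meteorite hypothesis already exceeds the threshold.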

Data Analysis 

NASA has been described as a data collection agency: each mission returns huge quantities of data, and Earth observing satellites return data at such a rate that it is difficult to archive, let alone analyze. Naturally, therefore, there are a number of data analysis projects within the Division.

Planetary Nebula Modeling: Stars like our sun end their lives as swollen red giants surrounded by cool extended atmospheres. The nuclear reactions in their cores create carbon, nitrogen and oxygen, which are transported by convection to the outer envelope of the stellar atmosphere. As the star finally collapses to become a white dwarf, this envelope is expelled from the star to form a planetary nebula (PN) rich in organic molecules. The physics, dynamics, and chemistry of these nebulae are poorly understood, and have implications not only for our understanding of the stellar life cycle but also for organic astrochemistry and the creation of prebiotic molecules in interstellar space.

This project is working toward generating three-dimensional models of planetary nebulae, which include the size, orientation, shape, expansion rate and mass distribution of the nebula, as well as its distance from Earth. Such a reconstruction of a PN is a challenging problem for several reasons. First, the data consist of images obtained over time from the Hubble Space Telescope and long-slit spectra obtained from Kitt Peak National Observatory and Cerro Tololo Inter-American Observatory. The images are, of course, all taken from a single viewpoint in space, which makes this a very challenging tomographic reconstruction. Second, the two disparate data types require a method that allows them to be used together to obtain a solution. Bayesian model estimation is applied using a parameterized physical model that incorporates much prior information about the known physics of the PN. By modeling the nebula in three dimensions, it is possible to reconcile the tangential expansion, observed as a change in the angular size of the object, with the radial expansion velocity determined from the Doppler shift in the spectral lines, thus providing accurate estimates of the object's expansion velocity, dynamical age, and distance from Earth.
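The geometric core of combining angular and Doppler measurements is the classical expansion-parallax estimate, which a full three-dimensional model can be seen as generalizing. Assuming (unlike the project's model) that the nebula expands spherically, so that its tangential speed equals the measured radial speed v, an angular expansion rate μ gives a distance d = v / (4.74 μ) in parsecs, with v in km/s and μ in arcseconds per year (1 AU per year is about 4.74 km/s); an angular radius θ gives a dynamical age θ/μ. The Monte Carlo step propagates hypothetical Gaussian measurement errors into the distance.

```python
import random
import statistics

AU_PER_YEAR_IN_KM_S = 4.74  # 1 AU per year is about 4.74 km/s

def expansion_distance_pc(v_kms, mu_arcsec_yr):
    """Distance in parsecs, assuming the tangential speed equals the radial speed."""
    return v_kms / (AU_PER_YEAR_IN_KM_S * mu_arcsec_yr)

def dynamical_age_yr(theta_arcsec, mu_arcsec_yr):
    """Age in years if the nebula has expanded at a constant angular rate."""
    return theta_arcsec / mu_arcsec_yr

def distance_samples(v, v_err, mu, mu_err, n=10000, seed=0):
    """Monte Carlo draws of the distance given Gaussian errors on v and mu."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        mu_draw = rng.gauss(mu, mu_err)
        if mu_draw > 0:  # discard unphysical non-positive expansion rates
            draws.append(expansion_distance_pc(rng.gauss(v, v_err), mu_draw))
    return draws

# Hypothetical measurements: v = 20 +/- 1 km/s, mu = 5 +/- 0.5 milliarcseconds per year.
d = distance_samples(20.0, 1.0, 0.005, 0.0005)
cuts = statistics.quantiles(d, n=20)
print(statistics.median(d), cuts[0], cuts[-1])  # median and 5%/95% quantiles
```

The spherical assumption is exactly what a real PN violates, since most are elongated or bipolar; that is why a model that fits shape, orientation and expansion jointly is needed to get accurate distances.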

Event analysis for GLAST: The Gamma Ray Large Area Space Telescope is a project to map the incidence of gamma rays from the entire sky. It is an orbiting telescope, scheduled for launch in 2006. It works by converting an incident gamma ray into an electron-positron pair in one of a stack of tungsten […]

[…] brain. Separating these signals into a set of components, each originating from a synchronous ensemble, has proven to be a very difficult problem.

The differential variability component analysis (dVCA) algorithm relies on a more physiologically realistic source model that accounts for variability of response amplitude and latency across multiple experimental trials. Rather than making unrealistic assumptions about the independence of components, this algorithm uses the differential variability of the evoked waveforms to aid their characterization. By applying the Bayesian methodology to this new source model, we derive an algorithm that uses EEG data simultaneously recorded from multiple electrodes to identify multiple components, each representing synchronous neuronal activity from an ensemble of neurons displaying a distinct trial-to-trial variability pattern. In addition, this algorithm estimates the single-trial amplitude and latency of each component active during any particular evoked response.
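The single-trial estimation step can be illustrated in a heavily simplified setting. The sketch assumes one component seen at one electrode, x(t) = C α s(t − τ) plus white Gaussian noise, where C is a coupling to the electrode, α the single-trial amplitude and τ the single-trial latency, with the waveform s and coupling C known. Under flat priors the most probable α and τ are then the least-squares fit: a grid search over τ with a closed-form α for each τ. The Gaussian-bump waveform and all numbers are hypothetical; the actual algorithm also estimates the waveforms and couplings of many components jointly.

```python
import math
import random

def waveform(t):
    """Hypothetical component waveform s(t): a Gaussian bump centered at t = 50."""
    return math.exp(-((t - 50.0) ** 2) / (2 * 8.0 ** 2))

def simulate_trial(coupling, amplitude, latency, n_samples=200):
    """Noise-free recording x(t) = C * alpha * s(t - tau) for one trial."""
    return [coupling * amplitude * waveform(t - latency) for t in range(n_samples)]

def estimate_single_trial(x, coupling, latencies):
    """Least-squares amplitude and latency (MAP under flat priors and white noise)."""
    best = None
    for tau in latencies:
        basis = [coupling * waveform(t - tau) for t in range(len(x))]
        # For a fixed latency the best-fitting amplitude has a closed form.
        alpha = sum(xi * b for xi, b in zip(x, basis)) / sum(b * b for b in basis)
        residual = sum((xi - alpha * b) ** 2 for xi, b in zip(x, basis))
        if best is None or residual < best[2]:
            best = (alpha, tau, residual)
    return best[0], best[1]

# Usage: one noisy trial with amplitude 1.5 and latency 7 samples.
rng = random.Random(0)
noisy = [v + rng.gauss(0.0, 0.05) for v in simulate_trial(0.8, 1.5, 7)]
print(estimate_single_trial(noisy, 0.8, range(-20, 21)))
```

Repeating this across trials yields the trial-to-trial amplitude and latency pattern that dVCA uses to tell components apart.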

Analysis of Earth Observing Data: Two of the projects involved in the analysis of Earth observing data are described below.

One project is looking at applying naive Bayes classifiers to MODIS (Moderate Resolution Imaging Spectroradiometer) data to generate a cloud mask product. The current methods of generating the cloud mask products from MODIS data at the DAACs (Distributed Active Archive Centers) are too slow for the product to be included in the broadcast stream, so it is not used in other data products, limiting their accuracy. The goal is to use naive Bayes to produce a quick product that could be sent out along with the data.
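A naive Bayes cloud mask reduces each pixel to a sum of per-band log-likelihoods, which is what makes it fast. This sketch fits a Gaussian naive Bayes classifier to labelled pixels; the two features (a visible reflectance and an infrared brightness temperature) and all numbers are hypothetical, and the real MODIS inputs, class priors and training data would differ.

```python
import math
from collections import defaultdict

def fit(pixels, labels):
    """Per-class prior and per-feature mean/variance (features independent given the class)."""
    by_class = defaultdict(list)
    for x, y in zip(pixels, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / n
            var = sum((v - mean) ** 2 for v in col) / n + 1e-9  # floor avoids zero variance
            stats.append((mean, var))
        model[y] = (n / len(pixels), stats)
    return model

def predict(model, x):
    """Class with the highest log posterior for pixel x."""
    def log_post(y):
        prior, stats = model[y]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - mean) ** 2 / (2 * var)
            for xi, (mean, var) in zip(x, stats))
    return max(model, key=log_post)

# Hypothetical training pixels: (visible reflectance, brightness temperature in K).
pixels = [(0.60, 240.0), (0.70, 235.0), (0.65, 245.0),
          (0.10, 290.0), (0.15, 285.0), (0.12, 295.0)]
labels = ["cloud", "cloud", "cloud", "clear", "clear", "clear"]
model = fit(pixels, labels)
print(predict(model, (0.55, 250.0)))
```

Classifying a pixel costs a few arithmetic operations per band, so a product like this could plausibly keep pace with the broadcast stream.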

A second project is looking at the uncertainty present in the data products themselves, many of which are derived from the raw satellite observations. The derivation of these data products from the observations and other data is often via empirically determined relationships (e.g. the production of Leaf Area Index maps from Normalised Difference Vegetation Index maps). The Earth Science community then uses these derived quantities with little appreciation of the range of uncertainty present, or of the effect of that uncertainty on predictions made using these derived data products. In this project we are analyzing the relationships used to generate certain data products, with a view to quantifying the uncertainty and making it available together with the data product.
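Monte Carlo propagation is one simple way to attach an uncertainty to a derived product. The sketch assumes a hypothetical empirical NDVI-to-LAI relationship, LAI = −ln((NDVI_inf − NDVI)/(NDVI_inf − NDVI_soil))/k, with invented coefficients and error levels; the relationships and error models actually studied in the project are not given here. Errors in both the input NDVI and the fitted coefficient k are sampled and pushed through the relationship.

```python
import math
import random
import statistics

# Hypothetical coefficients of an empirical NDVI -> LAI relationship.
NDVI_SOIL, NDVI_INF = 0.15, 0.90  # NDVI of bare soil and of a very dense canopy
K_MEAN, K_SD = 0.6, 0.08          # fitted coefficient and its (assumed) uncertainty

def lai_from_ndvi(ndvi, k):
    """LAI = -ln((NDVI_inf - NDVI) / (NDVI_inf - NDVI_soil)) / k."""
    frac = (NDVI_INF - ndvi) / (NDVI_INF - NDVI_SOIL)
    frac = min(max(frac, 1e-6), 1.0)  # crude guard: keep the log argument in (0, 1]
    return -math.log(frac) / k

def lai_with_uncertainty(ndvi, ndvi_sd, n=20000, seed=0):
    """Median and 5%/95% quantiles of LAI, sampling both NDVI and coefficient errors."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        k = max(rng.gauss(K_MEAN, K_SD), 1e-3)  # keep the coefficient positive
        draws.append(lai_from_ndvi(rng.gauss(ndvi, ndvi_sd), k))
    cuts = statistics.quantiles(draws, n=20)
    return statistics.median(draws), cuts[0], cuts[-1]

for ndvi in (0.50, 0.75, 0.85):
    print(ndvi, lai_with_uncertainty(ndvi, 0.03))
```

With the same NDVI error, the LAI interval widens sharply as NDVI approaches saturation; a single derived LAI map hides exactly this kind of information.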

Novel Interfaces: This builds on work done on Monte-Carlo methods for mixture modeling, in particular a Bayesian approach to the parameterization of Gaussian mixture models for the case where the distributions change over time. This work will be applied to “virtual keyboards”, where electrical signals from the muscles in the user's forearm are captured by dry electrodes on the skin and decoded to recognize the gestures associated with pressing a particular key. It will also be used to enable a “virtual joystick”, used to fly a high-fidelity aircraft simulator. These models are being developed to augment HMM-based models to improve performance.
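As an illustration of Monte Carlo mixture modeling, here is a minimal Gibbs sampler for a one-dimensional Gaussian mixture. It assumes a known, shared component standard deviation, a zero-mean Gaussian prior on each component mean, and a uniform Dirichlet prior on the weights. These are choices made for the sketch only: it does not handle the time-varying distributions or the multichannel muscle signals the project targets, and it deals with label switching crudely by sorting the sampled means.

```python
import math
import random

def gibbs_mixture(data, k=2, sigma=1.0, tau=10.0, n_iter=400, burn_in=100, seed=0):
    """Posterior-mean component means of a 1-D Gaussian mixture, by Gibbs sampling.

    sigma: known standard deviation of every component (an assumption of this sketch).
    tau:   standard deviation of the zero-mean Gaussian prior on each component mean.
    """
    rng = random.Random(seed)
    ordered = sorted(data)
    means = [ordered[int((i + 0.5) * len(data) / k)] for i in range(k)]  # spread-out start
    weights = [1.0 / k] * k
    sums, kept = [0.0] * k, 0
    for it in range(n_iter):
        # 1. Sample each point's component label given the current means and weights.
        groups = [[] for _ in range(k)]
        for x in data:
            logp = [math.log(w) - 0.5 * ((x - m) / sigma) ** 2 for w, m in zip(weights, means)]
            top = max(logp)
            probs = [math.exp(lp - top) for lp in logp]  # stable: the largest term is 1
            groups[rng.choices(range(k), weights=probs)[0]].append(x)
        # 2. Sample each mean from its conjugate Gaussian posterior.
        for j, g in enumerate(groups):
            precision = 1.0 / tau ** 2 + len(g) / sigma ** 2
            means[j] = rng.gauss((sum(g) / sigma ** 2) / precision, 1.0 / math.sqrt(precision))
        # 3. Sample the weights from their Dirichlet(1 + counts) posterior via Gamma draws.
        draws = [rng.gammavariate(1.0 + len(g), 1.0) for g in groups]
        total = sum(draws)
        weights = [d / total for d in draws]
        # Average the sorted means after burn-in (a crude fix for label switching).
        if it >= burn_in:
            for j, m in enumerate(sorted(means)):
                sums[j] += m
            kept += 1
    return [s / kept for s in sums]

# Usage on synthetic data from two well-separated clusters.
rng = random.Random(1)
data = [rng.gauss(-3.0, 0.5) for _ in range(100)] + [rng.gauss(3.0, 0.5) for _ in range(100)]
print(gibbs_mixture(data, sigma=0.5))
```

Each step samples one group of unknowns from its exact conditional distribution, so the chain explores the full posterior rather than returning a single point estimate as EM would.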